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ABSTRACT 

■  This  paper  addresses  the  problem  of  screening  potential 
variables  for  entrance  in  a  linear  multiple  regression  set- 
ting.  The  purpose  of  the  work  presented  here  is  to  propose 
two  screening  methods,  both  of  which  have  roots  in  principle 
component  analysis,  and  which  evaluate  a  combination  of 
variables  in  an  efficient  enough  manner  so  that  enumeration 
of  all  combinations  is  feasible  even  when  the  number  of  po- 
tential variables  is  quite  large.   Using  the  square  of  the 
multiple  correlation  coefficient  as  the  criterion,  the  se- 
lections made  by  these  methods  in  several  test  cases  are 
evaluated,  and  compared  with  the  selections  made  by  the 
methods  of  total  enumeration  and  stepwise  regression.   The 
paper  concludes  with  overall  evaluations  of  the  two  methods 
and  suggests  directions  for  further  study. 
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I.   INTRODUCTION 

The  search  for  the  important  variables  in  a  multiple 
linear  regression  setting  is,  due  to  its  great  practical 
importance,  an  area  which  has  received  considerable  atten- 
tion.  Researchers  often  use  a  large  number  of  potentially 
important  variables  in  the  exploratory  stages  of  their  work. 
These  need  be  screened  in  order  to  determine  the  most  par- 
simonious subset  of  these  variables  available  for  predicting 
or  estimating  the  response  with  an  acceptable  level  of  er- 
ror.  It  appears  that  a  total  enumeration  of  all  combina- 
tions of  variables  [Ref.  1]  is  necessary  to  be  assured  of 
making  the  best  selection.   This  process  is  computationally 
overwhelming  when  the  number  of  variables  becomes  even  mod- 
erately large.   (Each  combination  requires  a  matrix  inver- 
sion, and  there  are  2"  such  inversions  to  perform  if  p  is 
the  number  of  variables.) 

A  highly  popular  approach  to  this  problem  is  the  use  of 
stepwise  regression  [Refs.  2,  3,  and  4].   It  is  basically 
a  one-step  look-ahead  method,  and  uses  significance  tests 
based  on  distributional  assumptions  to  judge  the  combinations 
of  variables  under  consideration  at  each  step.   It  appears 
to  do  an  adequate  job  of  selection,  and  is  readily  available 
in  packaged  form  (specifically  BMD  and  SPSS). 

The  purpose  of  the  present  work  is  to  present  and  study 
some  alternatives  to  stepwise  regression  in  hopes  of  finding 
viable  competitors.   It  is  desirable  that  these  methods 


should  perform  at  least  as  well  as  stepwise  regression,  yet 
remain  feasible  computationally.  In  this  regard  the  method 
of  principle  component  analysis  of  the  antecedent  variables 
is  useful  and  serves  as  a  guide. 

This  paper  will  pursue  the  development  of  these  alter- 
natives in  the  following  manner.   A  discussion  of  the  general 
problem  will  be  presented  first,  along  with  remarks  concern- 
ing ways  of  measuring  the  effectiveness  of  the  combinations 
of  variables.   Comments  concerning  several  suggested  screen- 
ing methods  will  follow,  being  followed  in  turn  by  a  discus- 
sion of  stepwise  regression.   A  development  of  the  alternatives 
under  consideration  in  this  paper  will  be  presented,  and  a 
comparative  evaluation  of  their  performance  on  some  test 
cases  will  conclude  the  work. 


II.   THE  LINEAR  REGRESSION  MODEL 

In  order  to  introduce  the  linear  model,  it  is  convenient 
to  agree  to  a  common  notation.   Let  y  be  the  dependent,  or 
response,  variable,  and  x-.,x~,...,x  be  the  control,  or  an- 
tecedent, variables.   Assume  that  N  sets  (y ,x, ,x_ , . . . ,x  ) 
are  observed,  and  for  convenience,  let  each  member  of  the 
set  be  replaced  with  its  deviation  from  the  sample  mean: 

7j  *  fyj"^  and  xij  *  (xij_xV  i=l»-.-»P;  j-l,...,N.   (2.1) 

The  response  y  is,  of  course,  viewed  as  random.   The  an- 
tecedent variables  may  be  either  random  or  deterministic; 
it  does  not  matter  which.   We  are  concerned  only  with  the 
question  of  which  subset  of  them  should  be  permanently  col- 
lected and  not  with  formal  statistical  inference  -per   se. 
When  means,  variances,  covariances,  and  correlations  are  in- 
troduced, they  refer  only  to  the  sample  quantities.   Further, 
no  distributional  assumptions  are  made  about  y.   Thus  de- 
cisions regarding  the  appropriateness  of  the  various  com- 
binations of  variables  are  structured  on  ad  hoc   data  analysis 

grounds  and  not  on  formal  tests. 

N 
Using  the  column  vector  Y  to  denote  the  Cy-/i,  and  the 

N 
matrix  X  for  the  N  sets  of  {(x,  ...... x    ■)}■,,    the  usual  lin- 

ear  model 

Y  =   X   3   +   e  (2.2) 

Nxl    Nxp  pxl    Nxl 

is  assumed,  where  B  represents  the  vector  of  regression  co- 
efficients and  c  the  vector  of  residuals.   The  estimation  of 


3  is  achieved  by  the  method  of  least  squares,  the  solution 
being  [Refs.  2  and  5]: 

3  =  CX'X)"1X,Y  (2.3) 

assuming  X  has  full  rank  (as  it  generally  will  in  data 
analysis  situations)  and  the  prime  denotes  transpose.   The 
correlation  matrix  of  the  x, , . . . ,x   variables  is  introduced 
as 

Cpxp=^X'X-  C2.4) 

The  computational  problem  associated  with  total  enumera- 
tion can  now  be  made  more  explicit.   Consider  a  subset  of 

size  q,  (x.  ,...,x.  )  of  (x, ,...,x  ).   There  are  (  )  such 

1       """q  P  ^ 

subsets,  and  for  each  of  these  a  covariance  matrix  (a  minor 

of  (2.4))  must  be  inverted  to  produce  the  corresponding  ° 

(Eqn.  (2.3)).   This  done,  one  must  choose  the  best  (in  some 

sense)  subset  of  size  q  and  do  this  for  each  q=l,...,p. 

A  number  of  criteria  for  judging  the  fit  of  a  subset 

of  variables  are  available,  including  multiple  correlation, 

standardized  total  squared  error  [Ref.  6],  and  variance  of 

residuals.   The  approach  taken  here  is  to  chart  the  growth 

of  the  square  of  the  multiple  correlation  coefficient,  R2 , 

as  a  function  of  q.   This  can  be  done  for  total  enumeration 

(using  that  subset  (x.  ,...,x.  )  that  maximizes  R2  for  each 

1i     1q 

q) ,  for  stepwise  regression,  and  the  two  alternatives  to  be 
introduced.   Once  such  charts  are  made,  the  user  may  choose 
q  to  meet  his  own  needs.   The  researcher  hopes  that  R2  grows 
very  rapidly  and  becomes  very  close  to  its  maximum  (achieved 
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when  q=p)  for  small  q.   The  worse  case  occurs  when  R2  grows 
linearly  with  q. 

This  approach  seems  simple  and  reasonable.   No  formal 
significance  tests  are  made  and  the  tenant  difficulties 
connected  with  simultaneous  inference  are  not  addressed. 
Although  the  method  of  stepwise  regression  uses  formal  sig- 
nificance testing  in  its  intermediate  stages,  the  results 
of  using  it  can  still  be  compared  using  our  simple  ad  hoc 
approach. 

Remark:   In  recent  work  with  the  method  of  ridge  regres- 
sion [Ref.  7]  the  use  of  unbiased  estimators  Eqn.  (2.3)  is 
foregone  and  more  general  measures  based  on  average  squared 
error  are  used.   In  this  method  the  principle  diagonal  of 
the  covariance  matrix  Eqn.  (2.4)  is  loaded  in  an  effort  to 
trade  bias  against  a  smaller  mean  squared  error.   Comparison 
of  the  methods  presented  here  with  those  of  ridge  regression 
is  not  considered  in  this  thesis. 

To  present  computational  formulae  for  the  measures  of 

effectiveness  discussed  previously,  additional  notation  is 

convenient.   Let  s  be  the  pxl  column  vector  of  covariances 

between  y  and  (x, ,. . . ,x  ) ,  and  let  c   be  the  variance  of  y. 
1  *•  1'  .   *  -pJ  '  yy  J 

The  variance  of  residulas  can  be  estimated  as 

~2_1^^1 


a 


e-e  =  J  (Y-XB)-(Y-XB)  (2.5) 


where 

e  =  Y-X3  (2.6) 

is  the  estimate  of  residuals.   The  square  of  the  multiple 
correlation  coefficient  is  [Ref.  5] 
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.r-1. 


R2  =  1  -   a2  /  c   =  s'C   s  /  c 

YY  YY 


(2.7) 


and  C  is  given  in  Eqn.  (2.4).   When  the  model  is  reduced  to 
(x.  ■,..., x-  ),  the  matrices  and  vectors  must  be  modified 

i     q 

accordingly. 

Three  test   cases  are  introduced  to  evaluate  the  methods 
under  consideration  (Tables  I  -  III)  .   There  the  pertinent 
quantities  C,  s,  and  c   are  presented  as  augmented  matrices 
in  the  format 


yy 

s 


The  first  test  case  was  generated  specifically  to  expose 
a  weakness  of  the  stepwise  approach,  as  will  be  seen.   The 
second  matrix  was  designed  to  be  of  sufficient  complexity  to 
give  a  more  enlightening  comparison  of  the  methods  under  con- 
sideration, yet  small  enough  so  that  the  total  enumeration 
solution  could  be  obtained.   Care  was  taken  to  ensure  that 
the  criterion  of  positive  definiteness  (see  Appendix  B)  was 
met . 

Because  the  first  two  examples  are  artificial  in  nature, 
an  application  using  real  data  was  sought.   Such  was  ob- 
tained from  a  study  currently  underway  [Ref.  8].   There,  a 
survey  concerning  subjective  reactions  to  a  set  of  fourteen 
drugs  is  being  made  to  ascertain  how  the  subjects  perceive 
the  various  drugs.   Each  drug  is  rated  from  one  to  seven  on 
each  of  the  following  fourteen  scales;  violence,  growth, 
sharpness,  destruction,  ehancement ,  activity,  goodness, 
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avoidance,  integration,  positivity,  permanence,  speed, 
severity,  and  strength. 

One  of  the  studies  within  this  investigation  is  to  de- 
scribe the  scale  "severity"  in  terms  of  the  other  scales. 
More  specifically,  what  subset  of  the  other  scales  "best" 
describes  severity?   This  problem  presents  an  opportunity 
to  compare  the  screening  methods  being  presented  here  with 
stepwise  regression,  and  provides  a  third  test  matrix. 
This  data  is  rounded  to  one  digit  to  conserve  space,  and 
in  this  form  may  not  be  positive  definite.   See  Table  III. 
Of  course,  the  original  matrix  was  used  in  the  computations 
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1.0  0.7  0.6  0.6 

0.7  1.0  0.6  0.6 

0.6  0.6  1.0  0.1 

0.6  0.6  0.1  1.0 


Table    I 
TEST   MATRIX   ONE 


1.0      0.6      0.1      0.4      0.3  0.5 

0.6      1.0      0.4      0.1      0.2  0.5 

0.1      0.4      1.0      0.3      0.4  0.2 

0.4      0.1      0.3      1.0      0.2  0.4 

0.3      0.2      0.4      0.2      1.0  0.3 

0.5      0.5      0.2      0.4      0.3  1.0 

Table    II 
TEST   MATRIX  TWO 


-.2  0.4  -.3  0.3  -.2  - .  3   0.2    -.4    0.2 

-.1  -.4  0.4  -.3  0.4  0.3    0.1    0.4    0.0 

-.4  -.2  0.3  -.2  0.4  0.3    0.4    0.1    0.2 

0.0  -.6  0.6  -.4  0.5  0.4    0.0    0.5    -.1 

0.3  0.5  -.4  0.5  -.4  0.2    -.1    -.3    0.1 

1.0  0.1  0.0  0.1  -.1  0.1    -.3    0.0    -.2 

0.1  1.0  - .7  0.5  -.6  -.3  0.0    - .5    0.1 

-.1  -.7  1.0  -.5  0.6  0.3   0.0    0.5    -.1 

0.1  0.5  -.5  1.0  -.5  -.30.0    -.4    0.1 

-.1  -.6  0.6  - .5  1.0  0.5    0.1    0.4    0.0 

0.1  - .3  0.3  - .3  0.3  l.ffl    -.1    0.5    - .1 

-.3  0.0  0.0  0.0  0.0  -.1    1.0    0.0    0.4 

0.0  -.5  0.5  -.4  0.4  0.S   0.0    1.0    -.2 

-.2  0.1  -.1  0.1  0.0  -  .  1    0.4    -.2    1.0 


Table  III 
TEST  MATRIX  THREE 
(Entries  rounded;  see  Ref.  8  for  usable  data) 


1.0 

-.2 

0.1 

-.4 

0.4 

*"? 

1.0 

0.4 

0.5 

-.3 

0.1 

0.4 

1.0 

0.3 

-.3 

-.4 

0.5 

0.3 

1.0 

-.4 

0.3 

-.3 

-.3 

-.4 

1.0 

*   mu 

-.1 

-.4 

0.0 

0.3 

0.4 

-.4 

-.2 

-.6 

0.5 

-  .  3 

0.4 

0.3 

0.6 

-.4 

0.3 

-.3 

-.2 

-.4 

0.4 

s 

0.4 

0.3 

0.5 

-.4 

-  .  5 

0.3 

0.1 

0.4 

-.2 

0.2 

0.1 

0.4 

0.0 

-.1 

-.4 

0.4 

0.1 

0.5 

-.  3 

0.2 

0.0 

0.2 

-  .1 

0.1 
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III.   CURRENTLY  USED  SCREENING  PROCEDURES 

There  are  a  number  of  methods  currently  used  to  screen 
variables,  among  them  forward  selection,  backwards  elimina- 
tion, stepwise  regression,  and  several  graphical  techniques 
[Ref .  6] .   Because  stepwise  regression  incorporates  the  best 
of  two  of  the  above  methods,  enjoys  general  acceptance,  and 
is  readily  available  as  a  packaged  program,  it  will  serve 
as  a  baseline  for  measuring  the  performance  of  the  methods 
under  study  here,  and  will  be  discussed  in  greater  detail. 

Stepwise  regression  is  based  on  an  underlying  assumption 
of  normality  for  the  response  variable  y.   As  a  result  of 
this,  sums  of  squares  from  several  sources  (including  re- 
sidual error,  regression,  and  reduced  model)  have  Chi-Square 
distributions.   Thus  at  any  step,  a  potential  new  variable 
may  be  tested  for  its  significance  if  allowed  to  enter  the 
system.   Among  those  eligible  to  enter,  the  most  significant 
(in  terms  of  the  F  statistics  being  formed)  is  selected. 
Then  those  variables  entered  at  previous  steps  are  tested 
to  determine  whether  their  presence  remains  significant. 
Again,  statistics  are  used  as  the  criteria  for  deletion. 
When  no  variables  can  either  enter  or  leave,  the  process 
terminates.   Stepwise  regression  has  the  ability  to  look 
ahead  only  one  variable  at  a  time,  and  thus  cannot  guarantee 
an  optimum  solution. 

One  of  the  most  curious  characteristics  of  stepwise  re- 
gression as  it  is  used  in  the  packaged  programs  available 
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.  (specifically  the  BMD  and  SPSS  regression  packages)  is  the 
set  of  critical  points  for  the  F-statistic  used  by  the  com- 
puter as  criteria  for  entrance  of  variables.   In  order  to 
reduce  core  requirements  these  packaged  programs  allow  a 
single  value  from  the  F  table  to  be  used  as  the  criterion, 
despite  the  change  in  degrees  of  freedom  required  each  time 
a  new  F-statistic  is  formed.   Thus  what  may  be  a  test  of 
significance  at  the  level  a  for  one  F-statistic  with  p  and 
q  degrees  of  freedom  will  not  remain  so  for  a  new  F-statistic 
with  r  and  s  degrees  of  freedom.   As  a  result,  there  is  no 
easy  way  to  control  the  actual  level  of  significance  of  each 
test  performed  (much  less  the  overall  level  of  significance 
of  the  final  combination  chosen  with  the  multiple  tests). 
Further,  the  default  values  (for  entering  a  variable)  in 
both  the  SPSS  and  BMD  regression  packages  are  set  at 

F   ...   ,=0.01.   While  these  may  be  changed  by  the  user,  many 
critical  -         &    j  j 

users  are  unaware  of  the  problem  and  rely  on  the  default 
value.   This  default  value  is  not  one  which  most   users  would 
initially  choose,  as  for  most  F  tests  this  would  correspond 
to  a  significance  level  close  to  one.   (In  fairness  to  step- 
wise regression  it  should  be  stated  that  the  problem  is  with 
the  packaged  programs,  not  with  the  method  itself.) 

Two  advantages  to  stepwise  regression  are  worth  noting. 
The  first  is  that  it  only  needs  to  look  at  a  small  subset  of 
the  total  number  of  combinations  before  terminating,  and 
therefore  relatively  large  problems  become  computationally 
feasible.   The  second  is  that  when  the  underlying  assumptions 
are  met  the  results  of  its  screening  seem  to  be  generally 

quite  good. 
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There  are  a  number  of  disadvantages  to  stepwise  regres- 
sion.  The  first  is  that  it  is  extremely  difficult  to  "see 
into"  the  method  and  understand  what  it  is  doing  at  any  step 
The  computer  printout  is  confusing,  and  many  users  may  be 
only  dimly  aware  of  what  is  happening.   Consequently,  the 
results  obtained  may  often  be  unsatisfactory,  as  the  user 
delegates  too  much  analysis  to  the  computer  program. 

A  second  disadvantage  is  that  of  the  underlying  distri- 
butional assumptions.   Robustness  to  departures  from  the 
normality  assumption  becomes  an  issue,  as  well  as  the  pre- 
viously stated  problem  of  simultaneous  inference. 
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IV.   SCREENING  OF  VARIABLES 

The  method  of  principle  components  [Refs.  2  and  5]  ap- 
plied to  the  antecedent  variables  provides  a  platform  for 
introducing  the  screening  methods  presented  in  this  paper. 
Further,  the  growth  of  the  multiple  correlation  coefficient 
when  the  principle  components  are  used  as  antecedent  vari- 
ables is  easily  developed  and  serves  as  a  rough  standard 
for  the  kind  of  growth  that  may  be  available  using  the  orig- 
inal variables . 

The  mathematical  structure  is  useful  in  that  the  compo- 
nent: variables  are  orthogonal  (uncorrelated) ,  and  their 
variances  are  readily  obtainable,  as  are  their  correlations 
with  the  response  variable.   The  matrix  of  eigenvectors 
serves  to  expose  those  antecedent  variables  which  exert 
most  influence  over  the  orientation  of  the  original  data. 
Thus  it  seems  reasonable  that  useful  screening  methods  can 
be  obtained  by  first  finding  the  important  principle  com- 
ponents, and  then  the  important  original  variables  that  in- 
fluence these  components. 

General  developments  of  the  method  of  principle  components 
are  readily  available  (see  for  example  [Refs.  2  and  5]).   The 
salient  properties  used  herein  are  presented  below  and,  for 
sake  of  immediate  reference,  developed  in  Appendix  A. 

The  rotation  of  the  vector  of  antecedent  variables  x  to 
the  principle  components  vector  v  may  be  expressed  as  follows: 


v  =  W'x 


(4.1) 


where  the  columns  of  W,  (w, ,w2,...,w  ),  are  the  eigenvec- 
tors of  C  (Eqn.  (2.4)).   The  covariance  matrix  of  v  is  di- 
agonal with  the  variances  being  the  eigenvalues.   The 
covariance  of  a  typical  v.  with  the  response  y  can  be  cal- 
culated as 


CovCy.v^  =  ECyv^  =  wjE(yx)  =  w[s 


(4.2) 


and  hence  the  correlations  are 

w!  s  ., 

l  1 


y>vi 


/X7c 
i   yy 


.£,      w.  ./C77 


y5x. 


(4.3) 


where  w. -  are  the  elements  of  W  and  C   the  elements  of  C. 
When  the  principle  components  are  considered  as  being 
the  antecedent  variables,  it  is  an  easy  task  to  chart  the 
square  of  the  multiple  correlation  coefficient  R2  as  a  func 
tion  of  the  number  of  variables  q.   One  need  only  order  the 
(v, ,...,v  )  according  to  the  magnitudes  of  their  correla- 
tions with  y  (given  by  Eqn.  (4.3)): 


r     >  r     >  • 

y>v,    y,v9 


>  r' 


y,y 


p 


(4.4) 


It  follows  from  (2.7)  that 


7  q      9  q      1 

R2   =  .£-.  r2     =  .Z,  ^ 
pc    i=l   y,v.    i=l  X 


.  Z .  w  .  .  /C.  .  r 

i=i  ji  Ji    y>xj 


i   <*   i 

c    i=l  X. 

yy      i 


. E,  w . -  s  . 
j=l   ji   j 


(4.5) 
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A.   FIRST  SCREENING  METHOD  (Ml) 

Our  goal  is  to  select  the  best  subset  of  size  q  from 
(x,  ,...,x  ),  where  maximization  of  R2  is  the  criterion. 
Tb.e  best  q  principle  components  are  already  in  hand  from 
(4»4).   It  seems  reasonable  to  try  to  march  this  vector 
with  the  "closest"  q-dimensional  flat  in  x-space.   First 
we  will  introduce  some  notation:   the  new  rotation  to  the 
subset  of  q  principle  components  may  be  denoted 


wn,  ••••  wpl 


w,  ,  •  •  •  •  w 
lq'        pq 


x 


(4.6) 


That  is, 


v  =  W*     x 
qxl    qxp   pxl 


(4.7) 


Note  that  W*  consists  of  q  eigenvectors,  each  of  which  is 

complete  (that  is,  the  deletion  is  among  the  eigenvectors, 
not  across  them) . 

We  now  have  q  of  the  principle  components  represented 

as  linear  combinations  of  all  p  of  the  {x.}.   The  second 

step  is  to  reduce  the  number  of  x.  required.   Note  that  the 

act  of  dropping  some  of  the  x.  may  be  viewed  as  the  removal 

of  the  corresponding  columns  of  W* '  and  their  replacement  with 
columns  of  zeroes.   Thus 


v**  =  W**'   x 

qxl    qxp   pxl 


(4.8) 
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Finding  the  "closest"  subset  mentioned  above  will  be 
accomplished  by 

min  E  | |v*-v**| | 2  (4.9) 

where  the  operator  E  refers  to  averaging.   Such  a  minimiza- 
tion must  take  place  for  each  q=l,...,p.   The  solution  to 
this  problem  will  be  referred  to  as  method  Ml. 

In  terms  of  the  elementary  quantities,  (4.9)  may  be 
written  as 

E  ||v*-v**||2  =  E  | | (W*'-W**')x| |2 


=  E 


i  =  l 


.1,  (w* ■  -w**)x. 

3  =  1    ji   ji-'  j 


(4.10) 


q    P    P 
.X..  .E1  ,L  (w*.-w**)(w,  .-w**)E(x.x1)2 
i  =  l  j=l  k=l  v  ji   ji   ki   k.i^  v  j    kj 

In  order  to  describe  the  minimization  process  of  (4.9), 
let  S  be  a  set  of  q  subscripts  of  the  variables  considered 
for  inclusion  in  the  regression,  so  that  its  complement,  S, 
contains  subscripts  of  those  variables  to  be  excluded.   Then 
(4.10)  becomes 


q 

2  =   E    Z    I   w .  .  W-,  .  C  .  i  , 
i=l  jeS  keS   J1  kl  Jk 


(4.11) 


Further,  let  Q  =  (Q, , . . . ,Q  )  where 

1  if  i  S 
0  if  i  S 

Finally  let   Z    be  defined 
pxp 


(4.12) 
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(Z),  .  =  .  Z,  w.  -w,  - 
v  Jkj         i=l   ji  ki 


(4.13) 


It  follows  that  (4.10)  can  be  represented  as 


P   P 


(4.14) 


E||v*-v**||2  =   Z    I      Z.,C,  =  .1,  ,  Z,  Z.,  C,  (1-Q-)  (1-Q,  ) 

jeS  kcS  jk  jk   J  =  lk=1   JkJk    J    VkJ 


The  closest  v**  may  be  bound  by  forming  all  (q)  subsets  S  of 
size  q,  then  computing  (4.14)  for  each,  and  choosing  the 
smallest. 

It  is  useful  to  express  this  algorithm  in  another  way. 
From  (4.10)  we  obtain 


E| |v*-v**| |2  = 


q  p   p 

.Z,  .Z,  ,Z1  fW*,-W**M..(W*,-W**,).vC1 

1=1  j=l  k=l  w         ij  lk  ik 


(4.15) 


P   P 

j=i  k=i  C(w*'-w**i)'(w*«-w**'));j.kc:.k, 


Since 


(W**1) 


ij 


rw*n .  -Q. 


(4.16) 


we  see  that 


fw*1  -w**n . 


0     if  x.  is  to  be  included 
J 

w. .   if  x.  is  to  be  excluded 


(4.17) 


yielding  a  form  that  is  easily  computerized. 

B.   SECOND  SCREENING  METHOD  (M2) 

The  rationale  for  the  second  method  begins  with  the  ob- 
servation that  when  (4.5)  is  used  to  calculate  R2  fat  q=p) , 

^    j  max  v  i  r/ » 


R2 
max 


yy 


i  =  l  A. 

l 


.  En  w. .  s . 

J  =  l   Ji   JJ 


(4.18) 
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one  can  decompose  the  inner  summation  into  two  parts;  one 
associated  with  the  variables  to  be  kept  and  one  with  the 
variables  to  be  deleted.   The  former  is  indexed  by  the  set 
S  and  the  latter  by  its  complement  S.   Letting 


w.  -  s  . 


^    /c— T7       J1  J 
yy  1 

expression  (4.18)  can  be  written 


i>  j  =  l,  •  •  •  iP 


(4.19) 


R2 

max 


p  r  -] 

=  -E,    .%■„    t.  .  +  .E~-  t-  • 
i  =  l    jeS   ij    jeS   ijj 


(4.20) 


=  R*2  + 


E,  {(  E  t.  .)  (  2  t.  0  +  (  E  t.  -)2K 
-'     S   ^    S   1J       S   ^ 


i=l 


A  2 

This  serves  to  define  R   for  each  set  S  of  indices  of  vari- 
ables to  be  kept.   It  seems  reasonable  that  a  set  S  that 
maximizes  R*2  should  define  a  good  set  of  variables  to  re- 
tain.  The  process  of  determining  the  set  S  will  be  referred 
to  as  method  M2 . 

The  maximization  problem  may  be  couched  nicely  in  mathe- 
matical programming  notation  as  follows: 


*2      r 

maximize  R   =  . E, 

i=l 


.E,  t. .Q. 
3=1   iJxJ 


(4.21) 


.l1    Q.  =  q 
j=l  xj 


subject  to 

The  expression  also  lends  itself  well  to  conputerization. 

As  in  method  Ml,  we  must  generate  all  possible  Q  vectors 
and  compute  all  possible  candidates  for  (4.21),  ultimately 
choosing  the  largest  for  each  value  of  q. 

It  is  interesting  that  this  method  can  be  developed  from 
another  point  of  view.   From  (2.7)   the  square  of  the 
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multiple  correlation  may  be  expressed  as 

R2    =  —  s'C"1s  =  —Trace  [ss'C-1].        (4.22) 

maX     Cyy  Cyy  J 

In  hopes  of  finding  a  viable  screening  method,  we  again  con- 
sider a  subset  S  of  subscripts,  and  limit  the  above  trace 
computation  to  elements  of  S.   To  be  more  explicit,  let 

D  =  Diag(Q1,.. .,Q  ).  (4.23) 

Then  let  us  consider  screening  the  vector  s  with  D  by  looking 
at  the  pxp  matrix  Dss'D.   We  may  think  of  this  as  a  matrix 
which  has  been  stamped  with  a  grid  so  that  non-zero  elements 
can  be  found  only  in  those  rows  and  columns  both  of  whose 
indices  belong  to  S.   Comparison  of  all  possible  "stamps" 
of  order  q  allows  us  to  choose  that  combination  which  maxi- 
mizes the  trace  in  (4.22)   Equation  (4.22)  becomes  [Ref.  9] 

(4.24) 
R*2  =  -^—  Trace(Dss'DC_1)  =  -i-  Trace (ss ' DC_1D) . 

cyy  cyy 

However,  we  can  show  that  this  is  equivalent  to  method 
M2  as  follows.   Let 


A     =  E(vv')  =  E(W'xx'W)  =  W'E(xx')W  ■  W'CW 

pxp     v    *  y  *  y        J 

=    Diag(X1,...,Xp)  (4'25) 

and  notice  that 

C  =  WAW    and  C_1  =  WA_1W'  (4.26) 

properties  that  follow  from  the  orthogonality  of  W.   It  fol- 
lows that 

Trace[ss'DC'1D]  =  Trace [ss » DWA" 1W D] .       (4.27) 
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Since 


0*"V)rj  =  kIl  Wrkwjk/Xk  C4.28) 


it  follows  that 

P 


(DWA-VD)rj  -  k£l  wrkw.kQrQ./Ak.  (4.29) 

Since  (ss').   =  s.s  ,  (4.27)  becomes 

_i  P       P       P 

Tracefss'DW      XWD]    =      E,     .E.    ,  £,    s.s   w  ,w.,Q   Q-A, 
L  J         r=l   j=l   k=l      j    r   rk   jkxrxj      k 

(4.30) 
P         P         P  P  P  2 

yy   r=l  j  =  l  k=l   kj  krxrxj     yy  k-1  lj=1   kjxjj 

using  (4.19).   Comparison  of  this  with  (4.2Q)  completes  the 
proof.   Hence  method  M2  may  be  thought  of  as  an  attempt  to 
approximate  the  inverse  of  a  qxq  minor  of  C  by  a  stamped  ma- 
trix DC-1D. 
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V.   RESULTS  AND  CONCLUSIONS 

A.   RESULTS 

As  was  mentioned  previously,  the  multiple  correlation 
coefficient  is  a  convenient  criterion  for  judging  the  ef- 
fectiveness of  the  combinations  of  variables  selected  by 
the  screening  methods  under  discussion.   Thus  the  results 
of  the  screening  methods  can  best  be  summarized  with  the 
graphs  of  multiple  correlation  versus  the  number  of  variables. 

The  first  test  matrix,  as  mentioned,  was  designed  to 
expose  a  weakness  in  stepwise  regression.   Figure  1  indi- 
cates this  quite  well.   The  printout  of  the  SPSS  regression 
program  may  be  reviewed  in  Table  IV.   Variable  x,  was  se- 
lected first  because  it  was  highly  correlated  with  y.   Step- 
\tfise  regression  then  chose  variable  x_.   Variable  x,  remained 
significant,  and  was  left  in  the  equation.   Stepwise  regres- 
sion then  selected  variable  x? ,    found  it  significant,  and 
terminated.   It  never  observed  the  pair  x?,x.,  alone,  which 
in  this  case  was  a  much  better  pair  than  the  one  stepwise 
regression  chose  at  step  two  (x,  ,x_) .   Further,  had  stepwise 
regression  chosen  X-,x_,'it  would  not  then  have  selected  x,  . 
This  can  be  seen  in  Table  IV.   The  reason  for  this  is  as 
follows.   The  square  of  the  multiple  correlation  (R2)  of 
variables  x, ,x_  is  a  great  deal  smaller  than  R2  for  x?,x~. 
Thus  when  stepwise  regression  considered  adding  x?  to  the 
set  x-^x.,,  the  sizable  increase  in  R2  caused  it  to  accept 
the  new  variable.   On  the  other  hand,  when  it  was  given  the 


26 


CORRELATION  COEFFICIENTS 

A  VALUE  OF  99.00000  IS  PRINTED 

IF  A  COEFFICIENT  CANNOT  3E  COMPUTED, 


Y  XI  X2  X3 

Y  1.00000  0.60000  0.50008  0.50000 

XI  0.60000  1.00000  0.50000  0.50000 

X2  0.50000  0.50000  l.OOOOu  0.10000 

X3  0.50000  0.50000  0.10002  I. 00000 


DEPENDENT    VARIABLE..  Y 

VARIABLE(S)     ENTERED    ON    STEP    NUMBER       1..  X3 

XI 
X2 


MULTIPLE    R  0.70238 

R    SQUARE  0.49333 

STANDARD    ERROR  0.73465 


VARIABLES    IN    THE    EQUATION! 

VARIABLE  B  BETA  STD   £RROR    B  F 

X3                          '  0.33333  0.33333  0.12363  7.26? 

XI  0.26667  0.26667  w. 14210  3,522 

X2  0.33333  0.33333  2.12368  7.263 

(CONSTANT)  0.33333 

ALL    VARIABLES    ARE     IN    THE    EQUATION 

DEPENDENT    VARIABLE..  Y 

VARIABLE(S)     ENTERED    ON    STEP    NUMBER       1..  XI 

MULTIPLE    R  0.60000 

R     SQUARE  O.36J00 

STANDARD    ERROR  0.80829 


VARIABLES    IN    THE    EQUATION   

VARIABLE                                B  BETA               STO   ERROR    B                        F 

XI                                             0.60000  0.60000                   0.11547                 27.00C 
(CONSTANT)                        2.00000 


Tabic    IV, 
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VARIABLE(S)    ENTERED    ON    STEP    NUMBER      2..  X3 


MULTIPLE    R 
R    SQUARE 
STANDARD    ERROR 


0.64291 
0.41333 
0.76207 


VARIABLE 

XI 

X3 

(CONSTANT) 


VARIABLES    IN    THE    EQUATION    

B  BETA  STD    ERROR    B 


0.46667 
0.26667 
1.33333 


0.46667 
0.26667 


0.12901 
0.12901 


13.085 
4.273 


DEPENDENT    VARIABLE..  Y 

VARIABLE(S)     ENTERED    ON    STEP    NUMBER       1.. 


X3 
X2 


MULTIPLE    R 
R    SQUARE 
STANDARD    ERROR 


0.67420 
0.45455 
0.75410 


VARIABLE 

X3 

X2 

(CONSTANT) 


VARIABLES    IN    THE    EQUATION    

B  BETA  STD    ERROR    B 


0.45455 
0.45455 
0.45455 


0.45455 
0.45455 


0.10827 
0.10827 


17.62  5 
17.625 


Jr         *•-         -V  J,-  J* 


■Sc      =fe       *: 


*      #      * 


VARIABLE(S)     ENTERED    ON    STEP    NUMBER       2.. 


XI 


MULTIPLE  R 
R  SQUARt 
STANDARD  ERROR 


0.70238 
0.49333 
0. 73465 


VARIABLE 

X3 
X2 
XI 

(CONSTANT) 


VARIABLES    IN    THE    EQUATION    

B  BETA  STD    ERROR    B 


0.33333 
0.33333 
0.26o67 
0.33333 


0.33333 
0.33333 
0.26667 


0.12365 
0.12368 
0.i4210 


7.263 
7.263 
3.522 


ALL  VARIABLES  ARE  IN  THE  EQUATION 


Table  IV 


continued , 
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DEPENDENT  VARIABLE..     Y 

VARIAELE(S)  ENTERED  ON  STEP  NUMBER   3.. 


X2 


MULTIPLE  R 

R  SQUARE 
STANDARD  ERROR 


0.70238 
0.49333 
0.73465 


VARIABLE 

XI 

X3 
X2 

(CONSTANT) 


VARIABLES  IN  THE  EQUATION  

B  BETA      STD  ERROR  B 


0. 26667 
0.33333 
0.33333 
0.33333 


0.26667 
0. 33333 
0.33333 


0.1 42 10 
0. 12368 
0.123t>8 


7.2t)3 
7.263 


MAXIMUM  STEP  REACHED 


Table  IV.   continued, 
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opportunity  to  add  x,  to  the  pair  x2,x.-,  there  was  not  suf- 
ficient improvement,  and  x-  was  not  added.   Since  stepwise 
regression  missed  the  best  pair,  it  did  not  terminate  until 
all  three  were  included,  whereas  it  could  have  terminated 
with  two  variables  had  it  found  the  best  pair. 

Note  also  that  while  method  Ml  falls  into  the  same  trap 
as  stepwise  regression,  method  M2  selected  the  same  combina- 
tions as  did  total  enumeration  for  each  q=l,2,3.   The  reader 
will  also  observe  that  the  principle  component  multiple  cor- 
relation curve  serves  quite  well  in  this  graph,  and  the  two 
that  follow,  as  a  standard  with  which  the  other  methods  may 
be  compared. 

It  is  interesting  that  this  curve  and  the  total  enumera- 
tion curve  are  generally  very  close  in  the  cases  presented 
here,  and  in  fact  cross  each  other  in  the  second  case.   This 
observation  is  useful  since  in  many  applications  the  total 
enumeration  solution  is  unavailable. 

Figure  2  indicates  the  results  of  the  various  methods 
with  the  second  test  matrix.   Nore  that  while  the  R2  curve 
for  method  M2  runs  well  with  the  total  enumeration  and  step- 
wise regression  curves  (identical  curves  in  this  case),  the 
curve  for  method  Ml  runs  consistently  lower  throughout  the 
entire  midrange. 

Figure  3  presents  the  results  of  the  screening  methods 
on  the  third  test  matrix.   This  is  of  particular  interest 
because  the  results  should  be  useful  in  the  study  cited  ear- 
lier.  Note  again  that  the  curve  for  M2  runs  quite  strongly 
with  the  curve  for  stepwise  regression,  but.  that  the  curve 
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Figure  2.   Test  Matrix  Two  Results 
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for  Ml  falls  consistently  below  the  others.  Throughout  the 
midrange  of  all  three  test  cases,  the  curve  for  Ml  is  about 
two-thirds  of  the  total  enumeration  curve. 

Because  the  second  method  shows  greater  promise  as  a 
screening  device,  more  detailed  results  on  its  performance 
will  be  presented.   Unlike  stepwise  regression,  this  method 
is  capable  of  ordering  all  combinations  of  size  q  according 
to  their  R*2  values.   (Stepwise  regression  will  typically 
only  look  at  one  or  two  of  the  combinations.)   Thus,  Table 
V  contains  a  summary  of  the  combinations  selected  by  the 
second  method,  M2 . 

From  this  table,  the  graph  in  Figure  4  was  constructed, 
giving  R*   as  a  function  of  the  number  of  variables.   This 
curve  seems  fairly  typical  of  what  can  be  expected.   Note 
that  as  the  number  of  variables  allowed  to  enter  increases, 
R"2  initially  increases.   At  some  point  the  curve  will  peak, 
then  decrease  to  the  multiple  correlation  value  of  the  en- 
tire set  of  variables.   This  phenomenon  can  be  explained  in  the 
following  way.   The  value  of  T. .  (see  4.19,  4.20)  may  be  of 
either  sign.   When  the  number  of  variables  is  small,  it  is  fair- 
ly likely  that  some  combination  of  variables  will  be  such  that 
the  t. .'s  will  combine  with  the  same  sign,  so  that  when  squared 
and  summed  again  the  value  may  get  quite  large.  (In  the  three 
examples  used  in  this  paper,  the  value  typically  exceeded  one.) 
However,  as  the  number  of  variables  increases,  it  becomes  much 
more  likely  that  the  t-.'s,  differing  in  sign,  will  conflict 
with  each  other  and  reduce  the  overall  values  being  squared 
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0=1       P 
Vbles   R*^ 

0=2 
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2 

R*^ 

0=3 
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*1      ,5Qk 

X1x2 

.521; 

x1  x3x^ 

1.13 

x1x2x3xl| 

1.002 

x2      »015 

x1x3 

.965 

A.-1  j\^A,_ 

.871 

X^XyX^Xg 

.676 

x3      .215 

x1xl|. 

.741 

aSx2xi|. 

.650 

*1  X2X3X5 

.565 

x^     .115 

X1X5 

.14-98 

X4X  x- 
1    3  5 

.627 

X1X2Xl4-X5 

.14.26 

x^     .k.30 

X2X3 

.196 

*1*!l*5 

.550 

X2X3X1^X5 

.391 

X2\ 

.100 

x2¥5 

.518 

X2X5 

.14-78 

X1X2X5 

.14-72 

x3\ 

.333 

X2X3X5 

.14-06 

X3X5 

.393 

x3xl4.x5 

.351 

V* 

.kko 

X2X3X^ 

.296 

Table  V.   Combinations  Chosen  by  Method  2. 


, Q 


Figure    4.       Graph    of   R*2    vs.    Q    for   Method    2 
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and  summed.   The  curve  in  Figyre  4  seems  typical  of  what 
may  be  expected.   Thus  at  best  th^se  values  are  pseudo- 
correlations,  which  have  been  _-:. .n  to  rank  well  with,  but 
not  predict,  the  ranking  of  the  actual  multiple  correlation 
coefficients.   As  a  result,  this  method  cannot  guarantee  op- 
timal selections. 

Table  VI  gives  the  rankings  of  total  enumeration  and  of 
method  M2  for  each  combination  of  size  q=3  and  q=4  from 
test  case  two.   In  this  way,  the  results  of  the  method  may 
be  more  fully  evaluated  than  to  consider  only  the  "best" 
selection  made  at  each  step.   Using  the  rankings  given  in 
the  table,  Spearman's  Rank  Correlation  test  [Ref.  10]  was 
applied;  at  a=.l  the  rankings  of  total  enumeration  were 
significantly  correlated  with  the  rankings  of  M2  in  both 
cases.   This  lends  credence  to  the  second  screening  method, 
and  also  indicates  a  convenient  property  of  this  method;  the 
capability  to  rank  all  combinations  at  each  step. 

Finally  some  comments  on  the  use  of  principle  components 
are  in  order.   It  appears  to  provide  a  useful  standard  of 
comparison  for  the  other  methods  presented.   Indeed,  the 
method  may  be  of  more  direct  use  in  some  applications,  es- 
pecially when  researchers  find  that  p>N.   In  such  cases  the 
coefficients  B  may  not  be  directly  estimable,  but  any  ver- 
sion  of  least  squares  will  result  in  c=0.   Principle  compo- 
nents may  be  useful  for  preliminary  screening  so  that  there 
are  enough  degrees  of  freedom  N-q  to  obtain  an  acceptable 
estimate  of  variance  of  residuals. 
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0=3  Princ.   Components  Multiple  Correlation 


Variables 

R*2 

Rank 

R2 

Rank 

vyk 

1.126 

1 

.14.92 
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1    2   3 
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•5k$ 

1 

"SVfc 

.650 

3 

•W 
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X1X3X5 

.627 

k 

.W 

3 

X1\X5 

.550 

5 

A31 

6 

x2\x5 

.518 

6 

.278 

9 

X1X2X5 

.14-72 

7 

•ij-37 

5 

W5 

.14.06 

8 

.301 

8 

W5 

.351 

9 

.317 

7 

w 

.296 

10 

.223 

10 

Qr=k 

Sp< 

carman's    p    = 

=    0.93 

x1  X2x3xi^ 

1.000 

1 

.593 

1 

xlX3XliX5 

.676 

2 

.14-98 

3 

X1X2X3X5 

.565 

3 

.514-9 

2 

X1X2X]4.X5 

•14-26 

14- 

.^78 

k 

Vfk*s 

.391 

5 

.329 

5 

Spearman's  p  =  0.80 

Table  VI . 
Compared  Rankings  of  the  Second  Method  With 
Enumeration 
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B.   CONCLUSIONS 

On  the  basis  of  the  results  just  presented,  a  number  of 
observations  concerning  the  screening  methods  under  inves- 
tigation in  this  paper  may  be  made. 

The  first  method  (Ml)  does  not  appear  to  perform  par- 
ticularly well.   Its  selections  were  consistently  worse 
than  those  made  by  the  other  methods.   It  may  be  that  its 
disappointing  performance  is  caused  by  the  weak  correlation 
of  the  major  principle  components  with  the  response  variable. 
As  was  seen  in  the  three  examples  presented,  this  method 
does  not  in  general  even  approach  the  optimal  combinations, 
and  thus  shows  little  promise  per  se  as  a  screening  technique 

Method  M2  seems  to  be  a  reasonable  approach  to  the  prob- 
lem.  V.Tiile  it  will  not  in  general  make  optimal  selections. 
in  the  examples  used  here  it  has  done  quite  well. 

The  method  has  several  advantages  which  are  worthy  of 
mention.   The  first  is  that  after  an  initial  investment  in 
obtaining  the  eigenvalues  and  eigenvectors  of  C  (roughly 
equivalent  to  inverting  it,  in  time  spent),  the  amount  of 
time  required  to  examine  any  potential  combination  is  very 
small.   As  a  result  of  this,  enumeration  of  the  combinations 
becomes  feasible  even  when  the  number  of  variables  is  fairly 
large.   Appendix  C  contains  some  remarks  concerning  an  algo- 
rithm which  will  enumerate  quite  efficiently,  and  which  was 
used  in  the  FORTRAN  program  presented  there.   This  program 

*  2 

outputs  the  largest  value  of  R'  ,  and  the  combination  that 
produced  it,  for  each  value  of  q. 
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It  is  worth  noting  that  the  manner  in  which  M2  screens 
variables  is  much  more  readily  apparent  than  that  of  stepwise 
regression.   The  user  is  aware  of  the  process  by  which  vari- 
ables are  screened,  and  is  more  able  to  make  intelligent  use 
of  it. 

C .   SUMMARY 

The  results  presented  here  are  at  best  tentative,  since 
only  three  test  cases  are  presented.   Certainly  a  wider  spec- 
trum of  test  cases  is  necessary  to  establish  conclusive  re- 
sults for  the  methods  presented  in  this  paper.   Neither  stepwise 
regression  nor  method  M2  is  optimal,  but  both  remain  as  vi- 
able competitors  which  are  useful  as  screening  devices. 

Because  of  the  success  of  M2 ,  we  are  led  to  consider  the 
possibility  of  other  methods  of  approximating  the  inverses 
of  the  minors  of  C,  which  may  be  as  efficient  (with  computer 
resources)  ,  as  those  presented  yet  produce  results  which 
compare  more  favorably  with  total  emameration. 
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APPENDIX  A:   PRINCIPLE  COMPONENTS 

The  structure  for  principle  components  [Ref.  11]  arose 
from  a  desire  to  find  that  location  and  orientation  of  axes 
that  the  variance  of  the  data  swarm  about  them  is  minimized. 
The  necessary  location  of  axes  follows  from  a  linear  algebra 
theorem  which  states  that  a  sum  of  squares  centered  about  an 
arbitrary  point  is  minimized  when  that  point  is  the  cen- 
troid.   Thus  it  becomes  convenient  to  standardize  the  data 
about  their  means.   In  order  to  define  the  desired  orienta- 
tion, let  us  first  agree  to  the  following  conventions. 

Let  the  data  swarm  under  consideration  be  in  p-space. 
Let  the  X  matrix  contain  the  p  coordinates  of  the  N  obser- 
vations, and  let 

1   N  1   N 

X  .  »  £■   -£n  x.  .   and  si  =  i  .E,  (x..-x  -)2-   (A.l) 
.  j    Mi  =  l   ij        j    N  l = 1  *•  l  j   .3      ^    ' 

Then  let  the  X  matrix  have  entries 

(X)  .  .  =  ((x.  .-x  O/s.)  .  (A. 2) 

That  is,  let  the  X  matrix  contain  the  p  coordinates  of  the 
N  observations  after  the  variates  have  been  standardized  to 
zero  mean  and  unit  variance.   (Note  that  unit  variance  is 
not  necessary  to  what  follows.)   Then  it  follows  that  the 
correlation  matrix  is 

C     =  \  X'X.  (A.  3") 

pxp     N  v    ' 

The  rotation  itself  may  be  denoted 
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v  =  W'x  (A. 4) 

pxl 

where  v  =  (v.  ,v-  , .  .  .  ,v  )  '  ,  'x  =  (x,  ,x_  , .  .  .  ,x  )  '  ,  and  W    is 

a  matrix  column-oriented  vectors,  each  of  which  provides  the 
coefficients  for  transforming  the  x  vector  into  one  of  the 
component  directions. 

For  p=2,  this  may  be  displayed  graphically  (see  Figure  5). 

As  stated  previously,  the  goal  is  to  minimize  the  variance 
about  the  component  axes  by  choosing  the  matrix  W.   However, 
one  may  do  this  neatly  by  setting  W=0_,  and  thus  avoid  the  real 
problem.   To  obtain  a  tenable  solution  we  may  constrain  the 
problem  by  requiring  that  the  norm  is  one:   w'w=l.   The  vari- 
ance about  the  component  directions  may  be  written 

S2  =  E(w'x)2  -  E(w'xx'w)  -  w'E(cc')w 

=  w'Cw.  (A'5) 

Because  minimizing  the  variance  about  an  axis  is  equiva- 
lent to  maximizing  the  variance  along  that  axis,  and  because 
S2  represents  the  variance  along  the  axes  v, ,  our  problem 
will  be  to  maximize  S2.   To  show  that  this  is  true,  refer  to 
Figure  6.   Note  that  to  minimize  the  variance  in  the  v~  di- 
rection we  must  maximize  along  the  (orthogonal)  v,  direction. 
Orthogonality  will  be  shown  later. 

Thus  the  problem  of  finding  the  direction  V,  may  be 
written 

maximize  w'Cw    subject  to  w'w  =  1  (A. 6) 

where  w  is   an  arbitrary  column  of  W.   This  problem  may  be 
solved  conveniently  by  the  use  of  LaGrange  multipliers.   The 
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A       A 


A       A 


v1=x1cos8+x„sin9 
=  wux1+w12x2 


¥?=x1sin9+x0cosB 


=  W21X1+W22X2 


Figure  5.   Principle  Component  Rotation- 


Figure  6.   Maximizing  the  Variance  Along  v. 


41 


LaGrangian  is 


$x   =   w'Cw  -  X(w'w-l) .  (A. 7) 


The  LaGrangian  is  maximized  at  V<}>,=0 


7*,  =  2(C-XI)w  =  0.  (A. 8) 

Then 

(C-XI)  =  0.  (A. 9) 

Now  the  matrix  C  is  real  and  symmetric,  and  will  in  gen- 
eral be  positive  definite;  it  can  be  shown  to  be  at  least 
positive  semi-definite  as  follows:   Let  Z  be  a  non-negative 

vector,  so  that  Z'CZ  ■  Z'X'XZ  =  (XZ)'XZ.   Now  let  Y  =  XZ , 

N 

so  that  (XZ)'XZ  =  Y'Y  =  .£-.  y?  >  0  for  all  z.  in  the  vector 
^      J  i=l/i-  l 

Z. 

As  a  result,  it  can  be  shown  ^Ref.  9]  that  C  is  non- 
singular  with  real,  non-negative  eigenvalues. 

We  require  that  (C-XI)=0.   If  (C-XI)  is  non- singular , 
the  only  solution  to  the  simultaneous  equations  is  w=0_, 
which  violates  the  constraint  w'w=l.   Thus  we  require  that 
(C-XI)  be  singular,  and  it  follows  that 

|C-XI |  =  0.  (A. 10) 

Then  the  set  of  p  solutions  X.  to  this  equation  must  be  the 
characteristic  values  of  the  matrix  C.   Pre-multiplying 
equation  A. 9  by  w'  we  obtain 

w* (C-XI)w  =  w'-0  -  0.  (A. 11) 

Then 

w  '  Cw  -  W '  X I  w  =  X w  '  W  =  X  .  (A .  1  7. ) 
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But  w'Cw  =  S2.   Thus  A  is  actually  the  variance  along  the  com- 
ponent.  Since  we  wish  to  maximize  the  variance,  the  solution 
we  desire  is  the  largest  eigenvalue. 

Because  X    is  an  eigenvalue,  (A. 9)  implies  that  w  is  its 
associated  eigenvector,  and  thus  we  maximize  S2  by  choosing 
the  eigenvector  associated  with  the  largest  eigenvalue  as  the 
direction  v  .   Then  call  this  eigenvector  w  and  its  eigen- 
value A,  . 

The  direction  v~  may  be  found  solving  the  following  equa- 
tion: 

maximize  wJCw,   subject  to  w£w,  =  1  and  w'w,  =  0.   (A.  13) 

The  last  constraint  says  that  we  require  v,  and  v~  to  be 
orthogonal.   The  LaGrangian  is 

*2    =   wkC"k    "    9^wkwk"^    "    2TCw{wk-0)  (A. 14) 


and 


Then 


VcJ>2    -    2Cw,     -    29w,     -    2xw,    »    0.  (A.  15) 


(C-9I)wk  -  TWj  ■  0.  (A. 16) 

Pre-multiplying  by  w£, 

wk(c-ei)wk  -  w,w,  =  o  ■»■  wjL(c-ei)  ■  o  + 


wkCwk  =  9wkIwk  =  8wkwk  =  9  +  s*  s    0 


(A. 17) 


Thus  8  is  one  of  the  eigenvalues  of  the  C  matrix,  and  w,  its 
associated  eigenvector.   Now  prc-multiply  (A. 16)  by  w' 

wJ(C-6I)wk  -  TWiW,  ■  0  +   w*(C-  I)w-  ■  T  ■> 

(A. IS) 
w'Cw,  -  0w'  =  t  =  w'Cw,  . 
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Pre -multiplying    (A. 9)    by  w£, 

wj|.(C-XI)w1   =   w£«0   =    0   ■*■  w^Cw1    -   Aw^w1  -»  w^CWj   =    0.(A.19) 
Then  - 

WlcCwl    =    T    =    °  •  (A .  2  0 )  . 

Note  that  because  x  turned  out  to  be  zero,  the  constraint 
that  forced  the  new  component  to  be  orthogonal  to  the  first 
component  was  not  binding;  that  is,  w,  is  inherently  orth- 
ogonal to  w,  .   This  follows  because  C  is  a  real,  symmetric, 
positive-definite  matrix.   Since  x=0,  (A. 16)  becomes  iden- 
tical in  form  to  [A.  9),  so  that  9  is  an  eigemralue,  w-,  its 
associated  eigenvector.   It  was  shown  above  that  0  =  S*  (8  is 
the  variance  in  the  v~  direction),  so  that  it  follows  natural 
ly  that  6  is  the  second-largest  eigenvalue  (let  8  =  A9) .   Then 
w,  becomes  w7  . 

This  argument  can  be  generalized  so  that  X.,  the  i 

largest  eigenvalue  of  C,  is  the  variance  in  the  i   best 

component  direction  v. ,  and  the  associated  eigenvector  w^  is 

the  set  of  coefficients  mapping  the  x  vector  into  V. .   Now 

let  A    be  a  diagonal  matrix  with  the  ordered  eigenvalues 
pxp         &  & 

(largest  to  smallest)  in  the  diagonal.   The  rotation  we  de- 
sire is  then 

v  -  W'x, 

and  the  correlation  matrix  of  the  v.  is  A. 

i 
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APPENDIX  B:   PROPERTIES  OF  THE  CORRELATION  MATRIX 

Throughout  the  course  of  this  paper,  and  especially  in 
the  numerical  examples,  there  has  been  an  implicit  assumption 
that  if  a  matrix  is  real,  symmetric,  and  positive  definite, 
it  qualifies  as  a  correlation  matrix,  and  that  it  would  be 
possible  to  find  real  data  that  would  generate  such  a  cor- 
relation matrix.   While  this  is  not  overly  difficult  to  show, 
it  is  of  general  interest,  and  is  not  contained  in  many  text- 
books as  a  complete  derivation. 

The  mathematical  statement  of  the  problen  is  to  find  a 

matrix  X,T   such  that  X'X/N  =  C    ,  where  C  is  given.   It  has 
Nxp  '      pxp '  & 

been  shown  previously  that  to  qualify  as  the  product  of  a 
matrix  and  its  transpose,  C  must  be  positive  semi-definite. 
(If  C  has  full  rank,  it  will  be  a  positive  definite  matrix.) 
Further,  by  the  definition  of  correlation  between  two  vari- 
ables, C  must  be  real  and  symmetric. 

To  find  the  X  matrix,  the  initial  step  is  to  find  a  tri- 
angular matrix  T     such  that  T'T  =  C.   It  is  possible  to 

pxp  F 

compute  the  elements  of  the  T  matrix  by  carrying  out  the 
multiplication  (shown  in  this  case  for  a  3x3):  Let  T  be 
upper  triangular,  so  that 


~rll 

0 

0 

T12 

T 
22 

0 

T13 

T 

23 

3.5 

T     T     T 
11    12    13 


0 
0 


22 

0    T 


23 


33 


Cll   C12   C13 

('     C     C 

12  22    23 

C  C  C 

13  23    33 


or 
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T11T11   T11T12 


T   T 
111113 


T   T  T   T   +T   T  T   T   +  ,T   T 

11  12  12  12   22  22  12  1Z   22  23 

TT  TT+TT  TT+TT+TT 

11    13  13    12      23    22  13   13      23    23      33    33 


Thus 


Tn  -  ♦ST 


T12  =  C12  I    ,/Ell 


Cll  C12  C13 

c  c  c 

12  ^22  ^23 

c  c  c 

^13   23  ^33 


T13    C13  /    "^Tl 


22 


/U22  L12  '    Lll 


23 


C23"C12C13  /  Cll 
"VC22"C12  l    Cll 


33 


'13 
'11 


(C 


23 


C,  /  cTTT5 


'12"13 


11- 


L22   ^12  7   ll 


Note  that  at  every  step  it  is  possible  to  solve  for  T 


ij 


in  terms  of  the  C  .  and  only  those  other  T. -  for  which  it  has 

ij  ij 

already  been  possible  to  solve.   This  may  be  done  in  general 
for  a  C  matrix  of  any  size  p. 

Working  backwards,  we  have  that 


T'T  = 

=  C  =  X'X/N 

Then  let 

Z   = 
Nxp 

=   X    T"1 
Nxp  pxp 

Then 


Z'Z  = 


(T"1) 'X'XT'1 


(B.l) 


(B.2) 


(B.3) 


(T"1)CT_1  =  (T'^T'TT"1  =  I 


Let  Z  -  N(0_,  I),  and  wc  obtain  the  following  transformation: 

X  =  ZT.  (B.4) 

Then 

EX'X  =  T'Z'ZT  =  T1  IT  =  T'T  =  C. 
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Thus  having  obtained  a  matrix  Z  containg  N  observations 
on  a  p-variate  standard  normal,  it  is  possible  to  transform 
Z  to  a  new  set  of  N  observations  on  a  p-variate  normal 
(though  no  longer  with  unit  variance)  which  will  generate 
the  desired  correlation  matrix  through  the  judicious  choice 
of  a  triangular  matrix  T.   Thus  the  only  requirements  on  a 
matrix  are  that  it  be  real,  symmetric,  and  positive-definite 
in  order  to  be  a  correlation  matrix  [Ref.  9]. 
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APPENDIX  C:   FORTRAN  PROGRAM  OF  SECOND  SCREENING  METHOD  FOR 
APPLICATION  IN  TEST  CASE  THREE 


Because  of  the  generally  good  results  given  by  the  second 
screening  method  suggested  in  this  paper,  the  FORTRAN  program 
which  was  used  to  apply  it  to  the  third  matrix  is  listed  here. 
The  program  does  the  following: 

1.  It  reads  a  correlation  matrix  (then  reverses  the  last 
two  rows  and  columns,  as  in  this  case  X13  is  to  be  the  re- 
sponse variable)  and  also  reads  a  standard  deviation  vector. 
From  these  it  obtains  the  covariance  matrix.   (Lines  1  to  40, 
beginning  the  count  with  the  dimension  statement.) 

2.  It  calls  a  subroutine  (JACVAT)  which  calculates  the 
eigenvalues  and  eigenvectors  of  the  covariance  matrix.   (Lines 
34  to  50)  . 

3.  The  equation  developed  under  the  first  approach  to 
the  second  screening  method  is  used  to  calculate  the  multi- 
ple correlation  coefficient.   The  printout,  under  the  label 
"R-square  equals,"  and  showing  the  sums  and  squares,  may  be 
used  to  apply  the  screening  method  by  hand.   (Lines  50-73.) 

4.  The  covariance  matrix's  inverse  is  calculated  for 
transfer  to  the  subroutine  USER  in  order  to  use  it  with  the 
second  approach  to  the  second  method.   (Lines  74-90.) 

5.  The  next  part  of  the  program  is  the  "driver  program" 
for  the  subroutine  TWIDDL  (which  provides  the  vector  of  ones 
and  zeroes).   It  provides  the  initial  vector  (Q) ,  and  per- 
forms the  one-zero  exchange.   (Lines  93  to  120). 
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6.  The  TWIDDL  subroutine '"V  an  algorithm  originally 
written  in  Algol  for  the  Ass    -•  ..  •  n  of  Computing  Machinery, 
and  is  listed  as  Algorithm  384  in  the  CACM.   Essentially 

it  forms  every  possible  combination  of  m  ones  and  (n-m) 
zeroes,  but  each  new  vector  requires  only  the  interchanging 
of  two  positions  in  the  previous  vector  [Ref.  12]. 

7.  The  USER  subroutine  uses  the  inverse  of  the  covariance 

matrix,  the  indicator  vector  Q,  and  the  augmented  covariance 

matrix  to  screen  variables  using  the  second  approach.   For 

each  value  of  m,  it  continually  stores  the  largest  value  of 

R2   that  it  has  calculated  to  date,  and  when  m  increments  it 
s  ' 

prints  the  value  of  R2  and  the  •.■;tor  Q  which  produced  it. 

(Using  this  program  as  a  backbone,  it  is  a  straightfor- 
ward 'matter  to  obtain  a  program  which  will  apply  the  first 
method  to  the  data  matrix.) 
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//LAMBELL    JC3     ( 2629 , 1 242 ,RL42 ),' LAMBE LL • ,TI ME  =  3 
//    EXEC    FORTCLG, REGION. G0=70K 
//FORT.SYSIN    OD    * 

DIMENSION    A(14,14),B(13,13),C{13,13),D(13),SD(14) 
1F(  14, 14), Ql 13) 

COMMON     P0,P<13) 

INTEGER    H0L0Q<13) 
•      INTEGER    PO,P,X,Y,Z,DONE,Q 

NEW  =  0 

DO    20    1=1,14 

READ(5, 10IIAI It J) ,  J  =  l »8) 

READ(5,  10) CAC  I, J)  ,  J  =  9,14) 
10       FORMAT* 8(2X,F8. 5) ) 
20       CONTINUE 

DO    30    J=1T14 

TEMP=A( 13, J) 

A(13, J) =A( 14, J) 
30       A(14, J)=TEMP 

DO    40    J=l, 14 

TEMP=A(  J, 13) 

A( J,13j  =  A<  J, 14) 
40       ACJ,14)=TEKP 

WRITE(6,503 
50       FORMATC IX, /////, '        THE    CORRELATION    MATRIX    ISM 

DO    70    1=1 ,14 

WRITE (6, 601 I A< I, J) ,J= 1,14) 
60   FORMAT! IX, //,1X, 14F9.3) 
7  0   CONTINUE 

READ( 5, 80) (SD( I) , 1=1,8) 

READC5, 3G)  (5DC  I)  ,  1=9,14) 
80   F0RMAT(8(2X,F8.5) ) 

DO  90  1=1 ,14 

DO  90  J=l , 14 
90   F(I,J)=ACI,J1*SD(I)*SDC J) 

DO  100  1=1,13 

DO  100  J=l,13 
10  0   B ( I , J )  =  F (  I , J } 

CALL  JACVAT(8,13,1,D,C,13) 

WRITE(6, 1101 
110   FORMAT  I  IX, ////,«   THE  COVARIAiMCE  MATRIX  ISM 

DO  130  1=1,14 

WRITE(6,12GMFU,  J)  ,J=1,14) 
120      FCRMAT{  1X,//, iX, 14F9.3) 
130      CONTINUE 

WRITE(6,1401 
140       FORMAT( IX, ////,'        THE    EIGENVALUES    AREM 

WRITE(6,1501 ( Dl I ) ,1=1,13) 
150       FORMATC  IX,//,  IX, 13F9.3) 

WRITE (6, 1601 
160       FORMATC IX, ///,'     EIGENVECTORS    ARE     IN    THE    CCLUMNSM 

DO    180    1=1,13 

WRITE {6,1 701 IC(  I ,  J  1  ,  J  =  l ,131 
170      FORMATI 1X,//,1X, 13F9.5) 
18  0       CONTINUE 

DO    190     1=1*13 

DC    190    J  =  l  i  13 
190       B(I,J)=C(J,I}*i(14,J)*S0(J)/SQRT(D(Il) 

WRITEC6,2001 
200       FORMAT t IX, ////,'        R-SQUARE    EQUALS M 

DO    220     1=1,13 

WRITE{c,210) CBC  I  ,  J)  ,  J  =  l  ,13) 
210       FORMATC  IX,//,  '        (  ',  1  2 ( F8 . 5 ,  ' + M  ,  F 8. 5, •) ** 2     +M 
220       CONTINUE 

DO    240     1=1,13 

SUM=0.0 

DO    230     J=l ,13 
23  0       SUM=SUU*8(  It  J) 
240       SD( I )=SUM**2 

WRITE (6, 2 501 
250      FORMATC IX,////, •        OR    R-SQUARE    EQUALS*) 

WR  ITf (o,2oO) ( SD(  I  )  ,1  =  1  ,13) 
260       FORMATC IX  , //  ,  IX , 1 2 ( F9  .6  ,  ' * ' ),F9.6) 


50 


270 

280 


310 
320 


330 
340 
350 


370 
400 

410 


10 


20 


30 


40 
50 


60 


70 


60 


SUM=0 
DO  27 
SUM=S 
WRITE 
FORMA 
N=13 
DO  40 
P0=N+ 
PCN+1 
DO  32 
IFU. 
PU)  = 
GO  TO 
P(I  )  = 
CGNTI 
DCNE  = 
DO  34 
IFU. 
QU)  = 
GO  TO 
QU)  = 
CONTI 
CALL 
CALL 
IF  (DC 
Q(X)  = 
Q(Y}  = 
GO  TO 
CALL 
CONTI 
WPITE 
FORMA 
STOP 
END 
SU6RO 
COMMO 
INTEG 
J  =  0 
J  =  J+1 
IF(P( 
IF(P( 
I=J-1 
IFU. 
PU)  = 
1  =  1-1 
GO  TO 
P(  J)  = 
Z  =  l 
X  =  Z 

P(1J  = 

Y  =  J 
RETUR 
IF(  J. 
J  =  J+1 
IF(P( 
K  =  J-1 
I=K 

1  =  1+1 
IF(P( 
P  (I  )  = 
GO  TO 
IF(P( 
Z=P(K 
PCI  )  = 
X=I 

Y  =  K 
P(K)  = 
RETUR 
IFU. 
P(  JJ  = 
Z=P(  J 
PU)  = 


.0 

0 

UM 

(6 

T( 


1  =  1  ,13 
+  SD( I  J 
,230)  SUM 
IX1////1 ' 


GR  R-SW-ARE  EQUALS' ,F12. 9) 


0  M=l,6 

1 

)  = 

0 

GT 

0 
•3 


■-Z 
1  =  1, N 
,N-M) 


GO  TO  310 


I- 
NU 
0 
0 

GT 
0_ 
3 
1 

NU 
US 
TW 
NE 
1 
0 


20 

N+M 
E 

1  =  1,  M 
•  N-M) 

40 


GO  TO  330 


EP(B,Q,M,NEW,F ,HOLDQ,HOLD) 

IDDL(X,/,Z,DONE) 

.EQ.l )  GO  TO  370 


US 
NU 
(6 
T( 


35C 


ER(e,0,M,NEW,F,HCLnQ,HOLD) 

,410)  (riOLOQU )t I=if 13) f HOLD 

IX, 'Q=   fc>'  ,5X,  •  (  '  ,  13  12,  •   )'f5Xf ,R*2=,f E16.7) 


OUTINE  TWIDDL(X»Y,ZfDONE) 

l-JIN      i-0|r-l/C/ 

ER  X,  Y,Z,DGNE,PO, P 


J )  .LE.O]  30  TO  10 
J-l) .NE.O)  GO  TO  40 

LT.2)  GO  TO  30 
-1 


20 


0 


M 


GT.l)  P(J-1)=0 

J) .GT.O)  GO  TO  50 


I)  .NE.O)  GO  TO  70 
-1 

60 
I) .NE.-l)  GO  TO  80 
) 
Z 


-1 

N 

EU.PO)  GO  TO  90 

P(  I  ) 

) 

0 
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X=J 

Y=I 

RETURN 
90   DQNE=1 

RETURN 

END 

SUBROUTINE  USER ( B , Q , M , NEW , HOLOQ ,  HCLO) 

DIMENSION  3(13,13)tC(13il3JfQ(13) ,HUL3.M 13) ,PSUM( 13) 

INTEGER    Q.HOLDQ 

DO    10    1=1,13 

DO    10    J=l,13 
10       C(I,J)=B(  I,J)*Q( J) 

DO    20    I =1,13 

PSUM(I) =0.0 

DO  15  J=l,13 
15   PSUMl  I)=PSUM(  I)  +C(  I,  J) 
2  0   PSUMi I)=PSUM(I)**2 

SUM=0.0 

DO  30  1=1,  13 

SUM=SUM+PSUMJ I ) 
30   CONTINUE 

IF(NEW.EQ.O)  GO  TO  50 

IF(M.NE.NEW)  GO  TO  80 

NEW=M 

GO    TO    60 
50      HOLD=NEW 

NEW=M 
60       IF(HOLD.GT.SUM)     RETURN 

HOLD=SUM 

DO    70    I =1 ,13 

7  0       HOLDQ( I )=Q( I) 

RETURN 

8  0       L=M-1 

WRITE (6, 40)     L, ( HOLDQ( I ) ,1=1 ,13) ,riOLD 
40       FORMAT!  IX,  ,Q=«  ,13  ,5X,  •  (  '  ,  13  12,  '     )',5X,'R^ 
H  0  L  D  -  0 .  0 
NEW=M 
GO    TO    60 
END 
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